A Generalized Thurstonian Paired Comparison 
Multicriteria Heuristic Model for Peer Evaluation 
of Individual Performance on IS Team Projects 

Julian M. Scher 
Scher@adm.njit.edu 
Department of Information Systems 
College of Computing Sciences 
New Jersey Institute of Technology 
Newark, New Jersey 07102 USA 


Abstract 

Information Systems instructors are generally encouraged to introduce team projects into
their pedagogy, which raises the issue of objectively evaluating the performance of each
individual team member. "Freeloading" is a well-known problem on team projects, and for
this and other reasons a peer review process of team members, by team members, is often
advocated. We propose an objective heuristic model for obtaining a scale of individual
performance, based upon a generalization of Thurstone's Law of Comparative Judgment, in
which pairwise comparisons of team members' performance are elicited with regard to various
criteria, and we demonstrate how a scale may be obtained to objectively rate the individual
members of each team. A numerical example is provided to illustrate the heuristic
methodology of our Generalized Thurstone model.

Keywords: Thurstone, paired comparison, team projects, peer evaluation, multi-criteria,
individual performance


1. INTRODUCTION 

Team projects should be an inherent goal in the pedagogy of the IS instructor. The
justification for incorporating team projects has been succinctly stated by
(Steenkamp, 2002):

"The rationale is that once students enter 
the work environment they will be required 
to work in teams. Working in a team context 
challenges team members in a number of 
ways, such as: 

• Teams are composed of individuals with different technical skills, cultural
backgrounds, behavioral characteristics, cognitive styles and learning abilities.

• Performance of team members is influenced by the level of teamwork and in-field
experience, knowledge of the application domain, pressures of schedule,
geographical dispersion, full-time or part-time study."


The ABET-CAC accreditation criteria clearly specify that an accredited IS program
must, as part of its objectives, outcomes and assessment, enable all its graduating
IS majors to achieve, by the time of their graduation, "an ability to function
effectively on teams to accomplish a common goal" (ABET, 2008). The proper
evaluation of individual effort in a team project is a concurrent dilemma for the IS
instructor, who, for instance, needs to be cognizant of the often-encountered
"freeloading" which occurs in team projects (Fox, 2002). Being able to distinguish,
identify and measure individual contributions and excellence on a team effort is
vital to the success of the pedagogical exercise. The traditional practice of having
the instructor review the results of the team project, and then award the identical
grade to all members of the team, is problematic, and, as pointed out in (Tu, Lu,
2004), actually encourages and provides incentives for some of the weaker students
to "freeload." While this paper only addresses a peer evaluation methodology to
measure relative individual performance in a group project, much has been written
on the pedagogical issues relating to optimizing the group project experience for
students, and the reader is encouraged to peruse the research recommendations by
Tu and Tu (2004b).

One strategy for evaluation of individual performance on project and client teams is
that of peer evaluation, where individual students anonymously rate each other
(Lewis, 2006). Several methodologies have been suggested for the measurement of
individual student performance on teams (Ruble, Hernandez and Amadio, 2004),
though there is no universally agreed-upon standard. (Tu, Lu, 2004) discuss the
issue of truthfulness in peer evaluation rankings of team members, and offer a peer
evaluation methodology in which truth-telling becomes the dominant strategy of
individual team members. (Kelley, Sadowski, 2005) found that teams using a peer
evaluation instrument in an engineering design graphics course team project
functioned better than teams not using a peer evaluation instrument. (Lewis, 2006)
suggests that the peer evaluations be done on a weekly basis. On the other hand,
there are researchers who have raised doubts as to the usefulness of peer
assessments in group projects, such as (Kennedy, 2005), who questions the
underlying value of peer assessments, based on his experience with group projects
in university computing courses.

In this paper, we present a model for peer evaluation of individual team member
performance, based upon some generalizations and extensions of Thurstone's classic
Law of Comparative Judgment.

The "law" Thurstone created is essentially a measurement model, which requires
subjects to make a preference comparison between each of a number of pairs of
stimuli with regard to the magnitude of a property, attribute, or attitude
(Thurstone, 1927, 1929, 1959).

2. A GENERALIZED THURSTONE 
MODEL 

Let us assume we have n + 1 students on a team, where the instructor has divided
the class into project teams or client teams for a particular assignment, or set of
assignments. As part of the evaluation process to measure each individual student's
performance on his/her team, each student will be asked to make pairwise
comparisons among the other n students on his/her team according to a specific set
of criteria chosen by the instructor, the number of such criteria being denoted by m.

Then, since a given student will not be asked to vote in any pairwise comparison
involving himself/herself, each student will make n*(n-1)/2 paired comparisons
between different students for a specific criterion r.

Thurstone (1927) presented a conceptual 
model for paired comparisons based upon 
several assumptions: 

1. When a stimulus pair is presented to a subject, it will elicit a continuous
preference (referred to as a "discriminal process") for each stimulus.

2. The one stimulus whose value is greater at the moment of the comparison will be
the one that is preferred by the subject.

3. The aforementioned preferences are normally distributed in the population.

4. It is assumed that each individual will respond to all of the possible paired
comparisons.

Classic Thurstone analysis requires that after the individual students do their
pairwise comparisons of other students for each of the m criteria, the results be
presented in terms of a frequency matrix F, where f(i,j) denotes the count of those
who prefer alternative i to alternative j, where in our case i and j are individual
students.

We will extend the original Thurstone model of paired comparisons by assuming that
the students will be evaluating each other based upon a set of criteria established
by the instructor. The establishment of evaluation criteria for class projects by
the instructor is fairly typical in universities. For instance, Chandra
(http://tinyurl.com/nuw6ns) utilizes the following 5 criteria in evaluating
projects:

• Depth and breadth of research 

• Subject Knowledge 

• Project Presentation Quality 
• Final Project Report 

• Original or New Contributions 

We will denote by $F^r$ the frequency matrix for criterion r, where r = 1,...,m.
Thus, if we have m different criteria which we wish to use to evaluate members of
the team, we will then have a total of m frequency matrices.

The individual elements $f^r(i,j)$ of the frequency matrix $F^r$ will be the count
of those who "prefer" student i over student j according to the r'th criterion.

Quantifying this, we define $x_{i,j}(k)$, the "rating" by student k in terms of
his/her preference of student i to student j, for each of the m criteria, where
$k \neq i$, $k \neq j$ and $k = 1,\ldots,n+1$:

$$x_{i,j}(k) = \begin{cases} 1 & \text{if } i \text{ is preferred to } j \text{ by student } k \\ 0 & \text{if } j \text{ is preferred to } i \text{ by student } k \end{cases}$$

where we have the constraints that

$$x_{i,j}(k) + x_{j,i}(k) = 1 \quad \text{for } i \neq j$$

and

$$x_{i,i}(k) = 0 \quad \text{for } k = 1,\ldots,n+1.$$

We also seek transitivity in student k's rating, i.e., if $x_{a,b}(k) = 1$ and
$x_{b,c}(k) = 1$, then $x_{a,c}(k) = 1$. Also, we insist that each student be
required to make every comparison, without having any 'indifferent' votes.
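The transitivity requirement can be checked mechanically once a rater's votes are collected. The following is a minimal sketch, not part of the original model: the function names `complete` and `is_transitive` are hypothetical, the first example uses the ratings reported for student 1 in the numerical example, and the cyclic rater is invented purely for illustration.

```python
from itertools import permutations

def complete(upper):
    """Fill in x[j,i] = 1 - x[i,j] from the votes recorded for pairs (i, j)."""
    full = dict(upper)
    for (i, j), x in upper.items():
        full[(j, i)] = 1 - x
    return full

def is_transitive(votes):
    """Return True if a preferred to b and b preferred to c always implies a preferred to c."""
    students = {s for pair in votes for s in pair}
    for a, b, c in permutations(students, 3):
        if votes[(a, b)] == 1 and votes[(b, c)] == 1 and votes[(a, c)] != 1:
            return False
    return True

# Student 1's ratings of students 2..5 (implied ordering: 5, 2, 3, 4).
student_1 = complete({(2, 3): 1, (2, 4): 1, (2, 5): 0,
                      (3, 4): 1, (3, 5): 0, (4, 5): 0})

# Hypothetical rater with a cycle: 2 over 3, 3 over 4, but 4 over 2.
cyclic = complete({(2, 3): 1, (3, 4): 1, (2, 4): 0})

print(is_transitive(student_1))  # True
print(is_transitive(cyclic))     # False
```

An instructor could run such a check before aggregating votes and ask a rater to revisit any cycle; how to handle a failed check is a design choice the model itself leaves open.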

The elements $f^r(i,j)$ of the frequency matrix $F^r$ are then computed as follows:

$$f^r(i,j) = \sum_{k=1,\; k \neq i,j}^{n+1} x_{i,j}(k) \quad \text{for } i,j = 1,\ldots,n+1$$

To illustrate these preliminary concepts with some numerical data, suppose that we
have a team of n+1 = 5 students, whom we will denote by S1, S2, S3, S4 and S5.

Let us assume that we desire the frequency matrix $F^r$ for a particular criterion
(for instance, r = 1), and have queried the students to obtain the following
pairwise comparisons:

Let student 1's ratings be as follows:

$x_{2,3}(1) = 1$
$x_{2,4}(1) = 1$
$x_{2,5}(1) = 0$
$x_{3,4}(1) = 1$
$x_{3,5}(1) = 0$
$x_{4,5}(1) = 0$

Let student 2's ratings be as follows:

$x_{1,3}(2) = 0$
$x_{1,4}(2) = 0$
$x_{1,5}(2) = 0$
$x_{3,4}(2) = 1$
$x_{3,5}(2) = 1$
$x_{4,5}(2) = 1$

Let student 3's ratings be as follows:

$x_{1,2}(3) = 0$
$x_{1,4}(3) = 0$
$x_{1,5}(3) = 0$
$x_{2,4}(3) = 0$
$x_{2,5}(3) = 1$
$x_{4,5}(3) = 0$

Let student 4's ratings be as follows:

$x_{1,2}(4) = 0$
$x_{1,3}(4) = 0$
$x_{1,5}(4) = 0$
$x_{2,3}(4) = 1$
$x_{2,5}(4) = 1$
$x_{3,5}(4) = 1$

Let student 5's ratings be as follows:

$x_{1,2}(5) = 0$
$x_{1,3}(5) = 0$
$x_{1,4}(5) = 0$
$x_{2,3}(5) = 1$
$x_{2,4}(5) = 0$
$x_{3,4}(5) = 0$

We then generate the frequency matrix $F^r$ as presented in Figure 1, noting that
we obtain the remaining elements in the frequency matrix $F^r$ by using the fact
that $x_{i,j}(k) + x_{j,i}(k) = 1$.


Figure 1: The Frequency Matrix $F^r$

        S1   S2   S3   S4   S5
  S1     -    0    0    0    0
  S2     3    -    3    1    2
  S3     3    0    -    2    2
  S4     3    2    1    -    1
  S5     3    1    1    2    -

The elements in this frequency matrix are computed by the relationship between
$f^r(i,j)$ and $x_{i,j}(k)$, namely

$$f^r(i,j) = \sum_{k=1,\; k \neq i,j}^{n+1} x_{i,j}(k) \quad \text{for } i,j = 1,\ldots,n+1$$

Thus, for instance, 

$$f^r(3,4) = x_{3,4}(1) + x_{3,4}(2) + x_{3,4}(5) = 1 + 1 + 0 = 2$$

Also, we need to satisfy f(i,j)+f(j,i) = n-1, 
that is, the number of pair-wise comparisons 
done by the group for any two students will 
be n-1. 
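The aggregation from individual ratings to the frequency matrix can be sketched in a few lines. This is an illustrative sketch only: the dictionary layout and the function name `frequency_matrix` are assumptions, while the vote values are the ratings listed for students 1 through 5 in the example.

```python
# ratings[k][(i, j)] = 1 if rater k prefers student i to student j, else 0.
ratings = {
    1: {(2, 3): 1, (2, 4): 1, (2, 5): 0, (3, 4): 1, (3, 5): 0, (4, 5): 0},
    2: {(1, 3): 0, (1, 4): 0, (1, 5): 0, (3, 4): 1, (3, 5): 1, (4, 5): 1},
    3: {(1, 2): 0, (1, 4): 0, (1, 5): 0, (2, 4): 0, (2, 5): 1, (4, 5): 0},
    4: {(1, 2): 0, (1, 3): 0, (1, 5): 0, (2, 3): 1, (2, 5): 1, (3, 5): 1},
    5: {(1, 2): 0, (1, 3): 0, (1, 4): 0, (2, 3): 1, (2, 4): 0, (3, 4): 0},
}

def frequency_matrix(ratings, team_size):
    """Count, for every ordered pair (i, j), how many raters prefer i to j."""
    F = [[0] * team_size for _ in range(team_size)]
    for votes in ratings.values():
        for (i, j), x in votes.items():
            F[i - 1][j - 1] += x        # vote for i over j
            F[j - 1][i - 1] += 1 - x    # complementary vote for j over i
    return F

F = frequency_matrix(ratings, 5)
for row in F:
    print(row)
# [0, 0, 0, 0, 0]
# [3, 0, 3, 1, 2]
# [3, 0, 0, 2, 2]
# [3, 2, 1, 0, 1]
# [3, 1, 1, 2, 0]
```

The diagonal entries are left at 0 here, whereas Figure 1 shows them as dashes; for every off-diagonal pair the two entries sum to n-1 = 3, as required.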

With a team of n+1 students, the number of paired comparisons we ask of each
student, for each criterion, as previously stated, is n*(n-1)/2. For our
illustrative example, with n+1 = 5 students on the team, this involves 4*3/2 or 6
paired comparisons of fellow teammates for each of the m criteria. The total number
of paired comparisons for each student will therefore be m*n*(n-1)/2. Typical
student team sizes, such as client teams, are often 4 or 5, and if the instructor
keeps the number of different criteria small, such as 3 or 4, then the total number
of paired comparisons required of each student in the peer evaluation will be 24 or
fewer.
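The workload formula can be tabulated for the team sizes and criteria counts mentioned above. A small sketch, in which the function name `comparisons_per_student` is a hypothetical helper:

```python
def comparisons_per_student(team_size, num_criteria):
    """Paired comparisons one student makes: m * n * (n - 1) / 2, with n = team_size - 1."""
    n = team_size - 1
    return num_criteria * n * (n - 1) // 2

for team_size in (4, 5):
    for m in (3, 4):
        print(team_size, m, comparisons_per_student(team_size, m))
# 4 3 9
# 4 4 12
# 5 3 18
# 5 4 24
```

The largest case in this range, a 5-student team with 4 criteria, gives the bound of 24 comparisons quoted in the text.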

The second phase of the Thurstone model will be the transformation of the frequency
matrices $F^r$ into Probability matrices $P^r$, where $P^r$ denotes the Probability
matrix for the r'th criterion, where r = 1,...,m.

If there are n+1 students, with each individual student being asked to make paired
comparisons involving each pair of the other n students, then each student will
make n*(n-1)/2 paired comparisons, and the number of students making a paired
comparison between i and j, i.e., $f^r(i,j) + f^r(j,i)$, will be (n+1)-2, or (n-1),
since we omit the two specific students i and j, who do not make paired comparisons
involving themselves. The elements of the probability matrix, denoted by
$p^r(i,j)$, are then computed as follows:

$$p^r(i,j) = \frac{f^r(i,j)}{n-1}$$


We also compute, for each row k in $P^r$, the sum of the probabilities in row k,
which we denote by $V_k$ (k = 1,...,n+1).

The resulting Probability matrix $P^r$ is given in Figure 2.


Figure 2: The Probability Matrix $P^r$

        S1    S2    S3    S4    S5     V_k
  S1     -     0     0     0     0    0.00
  S2   1.0     -   1.0   .33   .67    3.00
  S3   1.0     0     -   .67   .67    2.33
  S4   1.0   .67   .33     -   .33    2.33
  S5   1.0   .33   .33   .67     -    2.33
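The step from frequencies to probabilities is a division by the number of voters per pair. A brief sketch using the frequency matrix of Figure 1; the name `probability_matrix` and the use of `None` for the diagonal are assumptions made for illustration.

```python
# Frequency matrix from Figure 1 (diagonal entries set to 0 as placeholders).
F = [
    [0, 0, 0, 0, 0],
    [3, 0, 3, 1, 2],
    [3, 0, 0, 2, 2],
    [3, 2, 1, 0, 1],
    [3, 1, 1, 2, 0],
]

def probability_matrix(F):
    """p(i, j) = f(i, j) / (n - 1); a team of n + 1 students has n - 1 voters per pair."""
    size = len(F)
    voters = size - 2
    return [[None if i == j else F[i][j] / voters for j in range(size)]
            for i in range(size)]

P = probability_matrix(F)
V = [sum(p for p in row if p is not None) for row in P]  # row sums V_k
print([round(v, 2) for v in V])  # [0.0, 3.0, 2.33, 2.33, 2.33]
```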


Following the computation of the Probability matrix, a new matrix is then computed,
traditionally called X in the psychometric literature, but for our nomenclature we
will refer to it as the Z matrix. The cell values of matrix Z are the standardized
normal deviates corresponding to the probabilities given in matrix $P^r$.
Thurstone's Law of Comparative Judgment prescribes that the scale value difference
between any two stimuli in a paired comparison assessment is a random variable
following a Normal (Gaussian) probability density function. The mean value of this
Normal distribution represents the scale value difference between the two stimuli
in question.

The Probability Matrix $P^r$ is transformed into the standardized Normal Matrix
$Z^r$, where the Z(i,j) values are obtained from the N(0,1) tables. We have
approximated the Gaussian distribution with a doubly truncated normal distribution
having truncation endpoints at -3 and +3, so that probabilities of 0 and 1 map to
-3 and +3, respectively (that is, CDF(-3) is treated as 0 and CDF(+3) as 1, where
CDF(x) represents the cumulative distribution function at point x). While one may
obtain precise values for the doubly truncated normal distribution (Johnson and
Thermopolous, 2002), there will be no harm in approximating these values by the
more widely accessible standardized Normal tables. Invoking the standardized normal
tables will yield the $Z^r$ matrix of Figure 3.
Figure 3: The Standardized Normal Matrix $Z^r$ (for r = 1)

        S1     S2     S3     S4     S5    T_k(r)
  S1     -     -3     -3     -3     -3     -12
  S2     3      -      3   -.43    .43       6
  S3     3     -3      -    .43    .43    0.86
  S4     3    .43   -.43      -   -.43    2.57
  S5     3   -.43   -.43    .43      -    2.57

The last column, $T_k(r)$, of Figure 3 represents the sum of the k'th row's
standardized normal values for the r'th criterion, and so, for each of the m
criteria, we have a vector T(r) with n+1 components, one per student.
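The table lookup and truncation can be reproduced with the inverse normal CDF from Python's standard library. This is a sketch under stated assumptions: `z_value` and `z_matrix` are hypothetical names, the cutoff of 3 follows the truncation endpoints described above, and exact inverse-CDF values replace the printed tables, so results may differ from the paper's table-rounded figures in the last digit.

```python
from statistics import NormalDist

# Frequency matrix from Figure 1 (diagonal entries set to 0 as placeholders).
F = [
    [0, 0, 0, 0, 0],
    [3, 0, 3, 1, 2],
    [3, 0, 0, 2, 2],
    [3, 2, 1, 0, 1],
    [3, 1, 1, 2, 0],
]

def z_value(p, bound=3.0):
    """Standard normal deviate for probability p, truncated to [-bound, +bound]."""
    if p <= 0.0:
        return -bound            # inv_cdf is undefined at 0, so use the lower endpoint
    if p >= 1.0:
        return bound             # inv_cdf is undefined at 1, so use the upper endpoint
    return max(-bound, min(bound, NormalDist().inv_cdf(p)))

def z_matrix(F):
    """Convert frequencies to probabilities, then to truncated normal deviates."""
    size = len(F)
    voters = size - 2
    return [[None if i == j else z_value(F[i][j] / voters) for j in range(size)]
            for i in range(size)]

Z = z_matrix(F)
T = [sum(z for z in row if z is not None) for row in Z]  # row sums T_k
print([round(t, 2) for t in T])  # [-12.0, 6.0, 0.86, 2.57, 2.57]
```

To two decimals these row sums agree with the last column of Figure 3.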

For each of the m criteria, the instructor will assign a weight given by $w_j$,
where j = 1,...,m, and the $w_j$ are non-negative and

$$\sum_{j=1}^{m} w_j = 1$$

(i.e., the weights constitute a convex combination).

Once we compute the $T_k(r)$ values for all criteria r (r = 1,...,m) for each of
the n+1 students (k = 1,...,n+1), we may then compute the scale values $A_k$ as
follows:

$$A_k = \sum_{j=1}^{m} w_j \, T_k(j) \quad \text{for } k = 1,\ldots,n+1$$

To illustrate the computations of the scale values $A_k$, let us assume that we
have 3 criteria (i.e., m = 3) assigned by the instructor:

Criterion(1) = overall quality of work contributed

Criterion(2) = availability and willingness to work with other team members and
support the work of the team

Criterion(3) = perceived amount of effort

The instructor believes that criterion(1), the overall quality of work contributed,
is twice as important as either of the other two criteria, and that criterion(2)
and criterion(3) are equal in importance, which leads us to the following weights:

$w_1 = .5$

$w_2 = .25$

$w_3 = .25$

For simplicity of presentation, we will assume that the previously computed Z
matrix was the one generated for criterion(1), and so

$T_1(1) = -12.0$
$T_2(1) = 6.0$
$T_3(1) = 0.8616$
$T_4(1) = 2.5692$
$T_5(1) = 2.5692$

We will provide data values for $T_k(2)$ and $T_k(3)$ (for k = 1,...,5) and not
bother the reader with the background details and computations for the associated
Z, P and F matrices.

So, let us assume we have for criterion(2)'s $T_k(r)$ values

$T_1(2) = -5.4$
$T_2(2) = 4.31$
$T_3(2) = -1.8$
$T_4(2) = 1.69$ and
$T_5(2) = 1.2$

For criterion(3)'s values, we have

$T_1(3) = -2.7$
$T_2(3) = 1.2$
$T_3(3) = -.9$
$T_4(3) = 1.0$ and
$T_5(3) = 1.4$

The "T" matrix, with one row per criterion and one column per student, will then be:

              S1       S2       S3       S4       S5
  r = 1    -12.0    6.000    .8616   2.5692   2.5692
  r = 2     -5.4     4.31     -1.8     1.69      1.2
  r = 3     -2.7      1.2     -0.9      1.0      1.4

and

$A_1$ = .5*(-12) + .25*(-5.4) + .25*(-2.7) = -8.025

$A_2$ = .5*(6.0) + .25*(4.31) + .25*(1.2) = 4.3775

$A_3$ = .5*(.8616) + .25*(-1.8) + .25*(-.9) = -.2442

$A_4$ = .5*(2.5692) + .25*(1.69) + .25*(1) = 1.9571

$A_5$ = .5*(2.5692) + .25*(1.2) + .25*(1.4) = 1.9346
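The weighted combination above can be reproduced directly from the T values and the weights. A short sketch in which the variable names are illustrative and the numbers are those given in the example:

```python
# T[r] holds the row sums T_k(r) for students 1..5 under criterion r.
T = {
    1: [-12.0, 6.0, 0.8616, 2.5692, 2.5692],
    2: [-5.4, 4.31, -1.8, 1.69, 1.2],
    3: [-2.7, 1.2, -0.9, 1.0, 1.4],
}
w = {1: 0.5, 2: 0.25, 3: 0.25}  # instructor-assigned weights, summing to 1

# Scale value for each student: weighted sum of that student's T values.
A = [sum(w[r] * T[r][k] for r in w) for k in range(5)]
print([round(a, 4) for a in A])  # [-8.025, 4.3775, -0.2442, 1.9571, 1.9346]

# Students ordered from most to least preferred.
ranking = sorted(range(1, 6), key=lambda k: A[k - 1], reverse=True)
print(ranking)  # [2, 4, 5, 3, 1]
```

Changing the entries of `w` shows how sensitive the final ordering is to the instructor's choice of weights, which matters here because students 4 and 5 are separated by a small margin.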

Plotting these points on a scale, we obtain 
Figure 4. 


Figure 4: The 5-Student Peer Evaluation Thurstonian Scale of Individual
Performance On A Team



Clearly, Student #1 is the least preferred student on the team, as judged by the
peer evaluation of the team, and by a significant degree. Student #3 is second in
the least preferred category, as evaluated by his/her peers. Student #2's
performance was recognized as being the best on the team. After Student #2, we have
Student #4 and Student #5 coming relatively close in the peer evaluation, with
Student #4 barely edging out Student #5.

The objective measures we have thus obtained will guide and support the instructor
in the difficult task of assigning grades to individual members of this team of
five students. While there will be some subjectivity on the part of the instructor
in his/her interpretation of these results, our inclination would be to reward
Student #2 with the highest grade, an "A," for his/her individual performance on
the team. Student #1 would be awarded the lowest grade, an "F," for a performance
that was clearly recognized as deficient by his/her teammates. Since Student #4 and
Student #5 were viewed positively for their near-equivalent performance, they
should each be awarded identical grades, very likely a grade of "B." For Student
#3, whose individual performance was viewed as slightly negative in the Thurstonian
comparative evaluation, we would be generous and award him/her a ("gentlemen's")
"C."

3. CONCLUSIONS 

We have presented a generalized heuristic Thurstonian model for use in peer
evaluation of individual performance on team projects. It is based on paired
comparisons of individual students, whereby each student compares pairs of other
students according to instructor-selected criteria, and the instructor also selects
the relative importance of each criterion. Classical Thurstonian concepts are
utilized and extended to produce an apropos scale on which instructors may review
the relative performance of individual students on teams.